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Abstract 


Link et al. (2010) define a general framework for analyzing capture-recapture data 


with potential misidentifications. In this framework, the observed vector of counts, y, 
is considered as a linear function of a vector of latent counts, x, such that y = Ax, with 
X assumed to follow a multinomial distribution conditional on the model parameters, 6. 
Bayesian methods are then applied by sampling from the joint posterior distribution 


of both X and 6. In particular, Link et al. (2010) propose a Metropolis-Hastings 


algorithm to sample from the full conditional distribution of x, where new proposals 
are generated by sequentially adding elements from a basis of the null space (kernel) of 
A. We consider this algorithm and show that using elements from a simple basis for the 
kernel of A may not produce an irreducible Markov chain. Instead, we require a Markov 


basis, as defined by Diaconis and Sturmfels (1998). We illustrate the importance of 


Markov bases with three capture-recapture examples. We prove that a specific lattice 
basis is a Markov basis for a class of models including the original model considered 


by Link et al. (2010) and confirm that the specific basis used by Link et al. (2010) for 


their example with two sampling occasions is a Markov basis. The constructive nature 
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of our proof provides an immediate method to obtain a Markov basis for any model in 
this class. 


1 Introduction 

The development of capture-recapture methodology has a long history, allowing estimation 


of demographic parameters of interest for animal populations (see Amstrup et al. 2005, for 


a review). Similar methods have also been used to study human populations, including 
intravenous drug users ( King et ar]|2009 ) and human rights abuse victims ( hum et aL]|2013 ). 
In general, a capture-recapture experiment consists of a series of capture occasions on which 
overlapping subsets of the population are observed. For animal populations the occasions 
are usually ordered in time while for human populations they may comprise lists obtained 
from different sources. It is assumed that each individual has a unique identifying mark that 
is either given or realized when the individual is hrst captured and this mark can be used to 
identify the individual on subsequent occasions. In this paper, we are concerned with htting 
capture-recapture models to data that provide an incomplete or inaccurate representation 
of the true encounters of individuals during the experiment. This may occur if the data 
consist of incomplete summary statistics or if individuals are misidentihed on some occasions. 
Examples of capture-recapture studies that are prone to identihcation errors include (i) multi¬ 
list studies in which individuals may be matched based on personal information such as name. 


birth date, medical record number (Seber et al.j2000 Lee et al. 2001, Sutherland and Schwarz 


2005, Fienberg and Manrique-Vallier|2009), (ii) animal studies in which individual identity 


is found from non-invasive sampling, e.g. genetic information from scat or hair (Wright et al. 


2009, Link et al. 2010, Yoshizaki et al. 2011) or photographic ID of individuals (Yoshizaki 


et al. 2009| Bonner and Holmberg 2013, McClintock et al. 2013), and (hi) studies in which 
(at least) two sources of capture-recapture information are available for the same population 
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with little to no information about how the individual IDs in one source corresponds to 


individual ID from the other sources (Bonner and Holmberg 2013, McClintock et ah 2013). 
Our focus is on the algorithm for a general class of mark-recapture models allowing for 


misidentihcation considered by Link et ah (2010) (hereafter L2010). This class is described by 
the latent multinomial model, in which an observed data vector, y can be expressed as a linear 
function of a latent data vector, a;, modeled by a multinomial distribution with unknown 
parameters 0, denoted [x\6]. The notation [x] denotes the probability density function fx{x) 
for a continuous random variable X or the probability mass function Pr(X = x) for a discrete 
random variable X. The linear function is expressed as 


y = Ax, ( 1 ) 

where A is called the conhguration matrix (a matrix of known constants that depends on 
the specihc problem) with more columns than rows. We continue to call this modeling setup 
the latent multinomial model, even though the setup is flexible and can accommodate other 
probability mass functions [x\0], such as the Poisson model considered by Lee (2002). 

The goal is to sample from the joint posterior distribution [6,x\y] using Markov chain 
Monte Carlo (MCMC) by alternating between sampling from the full conditional distribu¬ 
tions [6\x,y] and [x\y,6]. The difficulty with this approach is in specifying an updating 
scheme for x. That is, how to efficiently sample from [a;|y,0] in such a way so that every 
X vector that satishes ([^ has a positive probability of being reached at some point during 
the updating. We consider three examples demonstrating that the scheme for updating x 
proposed by L2010 may not produce an irreducible Markov chain for models within the la¬ 
tent multinomial framework. We then present theory identifying a class of models for which 
the specihc algorithm does produce irreducible Markov chains, and show more generally how 
these methods ht within the framework of algebraic statistics. This allows us to develop 


Lee (2002 
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an extension of the algorithm which can be used to generate valid MCMC samplers for the 
posterior distributions from a broader class of latent multinomial models. 

The MCMC algorithm we consider throughout this manuscript is presented in Figure [Tj 
Starting with an initial state satisfying the linear constraint, a proposal is generated on 
the hrst iteration by adding or subtracting an element chosen randomly from a subset of 
the kernel (or null space) oi A, B = {ai,a 2 ,... ,am} C ker(A), with cardinality m. The 
proposal is then accepted or rejected with probability determined by the Hasting’s ratio, r, 
and the algorithm continues to the second iteration. This algorithm is a modihcation of that 
presented by L2010, with three differences: (i) L2010 steps through all m elements in B in 
order instead of selecting an element at random on each iteration, (ii) when stepping through 
every element in B, L2010 multiplies element by a coefficient c G {—Q,..., —1,1,..., C,} 
in order to improve convergence, and (iii) L2010 assumes that H is a basis for ker(A), while 
we allow H to be a more general subset that spans ker(A). The first two differences may 
impact the efficiency of the algorithm but do not change the stationary distribution of the 
resulting Markov chains, and we do not consider these differences further. Our focus is on 
the third difference and the effect that the set B can have on the generated Markov chains 
and their stationary distributions. 


[Figure 1 about here.] 

To illustrate the problems that may occur if B is poorly specihed we consider three exam¬ 
ples of models which £t into the latent multinomial framework. First we consider the same 
closed population mark-recapture model with misidentihcation considered by L2010. This 
model, called Mt^, assumes that captures occur according to a closed population model with 
time dependent capture probabilities and that errors in identifying an individual are unique 
and create ghost histories with single captures. Second, we consider a multi-list modeling 
problem in which summary statistics are presented in place of the full data set, possibly 
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for privacy reasons. Our aim is to sample from possible complete data sets with the given 
sufficient statistics. Finally, we consider a more complicated model of misidentification in 
mark-recapture which allows for one marked individual to be identified as another previ¬ 
ously marked individual. Full details of these models and the issues regarding the selection 
of the set B to be used in the algorithm in Figure [T] are provided in sections ii and|^ As 
motivation, we consider the output from Markov chains constructed using the algorithm in 
Figure [T] for each of the three examples. For each example, we defined to be a basis for 
ker(A) as in L2010 and ran two parallel chains, each of which started from a different initial 
value. For both model Mt^ and the multi-list model with sufficient statistics, despite strong 
evidence that each chain has converged, it is clear that the two chains are not sampling from 
the same distribution for a given quantity of interest (Figure]^. This is even more apparent 
in the third example where one of the two chains never moves from its initial value. 


[Figure 2 about here.] 


The problem in all three examples is that the stationary distribution reached by the 
Markov chains produced by the algorithm in Figure [T] may depend on the chosen set, B and 
the initial value of x. Although the values of x proposed on each iteration are guaranteed 
to satisfy the linear constraint the resulting Markov chains may not reach all points in the 
sample space and the stationary distributions may be dependent on the initial values. In the 
next section we provide a basic introduction to the field of algebraic statistics and the results 


of Diaconis and Sturmfels (1998) and others who have explored approaches for sampling 


from X from a linear constraint as in ([^ in other application areas. We then consider the 
implications of this theory to show why the MCMC algorithms failed above (Figure]^, and 
how valid MCMC samplers can be constructed for each of the three examples. 
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2 Introduction to algebraic statistics 


Sampling x in the presence of the linear constraint in Q is not unique to capture-recapture 


problems. In a seminal paper in algebraic statistics, Diaconis and Sturmfels (1998) considered 
a linear constraint of the same form when developing conditional goodness-of-fit tests for 


contingency tables (see Karwa and Slavkovic 2013, for a recent review). That is, they 
considered how to construct an MCMC algorithm to sample different contingency tables 
with common (fixed) row and column sums (such ideas can also be extended to multi-way 
contingency tables). 

To consider the problem at hand in more detail we will summarize several definitions and 
results from linear algebra in this section (basic definitions regarding kernels and bases are 
provided in the supplementary materials). We will use a 3 x 3 contingency table example to 
illustrate many of the ideas. The table is 


Xii 

X12 

Xl 3 

Xi. 

X21 

X22 

X23 

X2- 

X31 

X32 

X33 

X3. 

X.i 

X.2 

X .3 



where Xij is the value in the ith row and jth column, x.j refers to the sum of the jth column 
and Xi. refers to the sum of the ith row. The column and row sums are vectorized to give 
the vector of summary statistics 


y = {x.i,X.2,X.2.,Xi.,X2.)'. 

Note that we need not include the third row sum as this is a derived quantity of the other 
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elements of y. The individual entries in the table are vectorized to give 


X — (xn, X21, x^i, Xi 2 , X22, Xis, X23, x^s)'. 


The specihcation is completed with 


fl 

1 

1 

0 

0 

0 

0 

0 

0 ^ 

0 

0 

0 

1 

1 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

1 

1 

1 

0 

0 

1 

0 

0 

1 

0 

0 


1 

0 

0 

1 

0 

0 

1 

0/ 


so that the constraints inherent in a contingency table follow Q. If we have column/row 
sums given by 

= (5,3, 2, 0,4)' 

then two contingency tables compatible with these constraints have entries 

a;i = (0,2, 3,0,1, 2, 0,1,1)' and = (0, 3,2, 0, 0, 3, 0,1,1)'. (2) 

Our goal is to specify an MCMC algorithm that samples from the set of vectors x that 
satisfy ([^ for a particular y. This is dehned as the y-£ber (or simply hber) J^y, 

J'y = {x e : y = Ax}, 

where d is the dimension of x and N = {0,1,..L2010 refers to Ty as the feasible set. 

To move between elements of the fiber, we make use of the lattice kernel kerz(A). The 
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lattice kernel is the integer valued subset of the kernel, 


kerz(A) = ker(A) = {x e : Ax = 0 }. 

In algebraic statistics, a move is dehned to be any element of the lattice kernel, such that 
the vector v is a move if r; G keTx{A). An implication of this is that if Xi,X 2 G J^y then 
X 2 — X 1 is a move. The idea is that the elements of the lattice kernel can be added to a vector 
that satishes the linear constraint and the result is guaranteed to still satisfy the constraint. 
However, it is not practical to consider all elements of the lattice kernel when updating x as 
ker(A) is potentially very large and difficult to compute. Instead we want to hnd a smaller 
set of moves B = {r^i,..., v^} C kerz(A) that can be used to update x. That is, we require 
a smaller set of moves so that it is possible to move between all elements of Ty using the 
algorithm in Figure 

The suggestion of L2010 was to use a basis for ker(A) for this set of moves. However, we 
do not wish to construct a basis for ker(A), but instead a lattice basis for the integer lattice 
kerz{A). A lattice basis is a set of linearly independent vectors where every v G ker^iA) 
can be found as a linear combination of the lattice basis vectors using integer coefficients. 
If we insist on using a basis for ker(A), it may not be possible to reach all solutions using 
only integer values of the coefficients, c, as specihed in the algorithm in Figure [Tj However, 
even if we choose to use a lattice basis for B it may be necessary to pass through one (or 
more) vectors containing negative elements when applying moves one at a time to transition 
between elements in the hber Ty. As vectors x containing negative elements can never be 
accepted, the use of a lattice basis for B may result in sampling from a subset of the hber 
J-y when using the algorithm in Figure This explains the observed results in the three 
examples shown in Section [T} the two chains are exploring different subsets of the hber. 

These ideas are formalized using the concept of connectivity. Elements Xj,Xk G J^y 


are connected using the set V = (vi,..., v^) if there are moves Vi G V, i G {1,..., M} 
so that we can start from Xj and add or subtract these moves one at a time to reach Xk 
without any element in any of the partial sums ever being negative (note that the elements 
Vi, i = 1,..., M need not be distinct and some elements may be repeated multiple times). 
That is, there exist ei,..., cm G { — 1,1} such that 

M L 

Xk = Xj + ejVj and Xi + SkVk G Ty, L = 1,..., M — 1. 
i=i k=l 


We then say that the hber J^y is connected by V if every pair of elements in the hber are 
connected. 

We can apply the algorithm in Figure [^to the 3x3 contingency table example using the 


elements of a lattice basis. A lattice basis can be found using the Hermite normal form (Aoki 


et ah 2012, pg. 53). Unless otherwise stated, all lattice bases provided in this manuscript 


are found using this approach. We note that the lattice basis obtained is not unique and a 
different basis is often found if one reorders the columns of A (and corresponding entries of 
x). For the contingency table, a lattice basis is given by elements LBl - LB4 in ([^ 



Xii 

3^21 

2^31 

3^12 

3^22 

3^32 

3^13 

3^23 

3^33 

LBl 

1 

-1 

0 

-1 

1 

0 

0 

0 

0 

LB2 

-1 

0 

1 

1 

0 

-1 

0 

0 

0 

LB3 

1 

-1 

0 

0 

0 

0 

-1 

1 

0 

LB4 

0 

0 

0 

1 

0 

-1 

-1 

0 

1 


If we attempt to apply any of the elements LBl — LB4 to either a;i or a ;2 in ([^ we imme¬ 
diately hnd a problem. Either adding or subtracting any of LBl - LB4 results in at least 
one negative count in the proposal and will lead to it being automatically rejected. That 
means there is no way to use the elements LBl - LB4 as moves in the algorithm in Figure 
and successfully transition between the two solutions in (|^. In fact, we are unable to move 


9 








between any two valid solutions. As a result, the lattice basis in (|^ does not connect the 
hber for this example. One solution is to change the algorithm in Figure to use elements 
of a lattice basis in a linear combination instead of one-at-a-time. While attractively simple, 


Diaconis and Sturmfels (1998) implemented this for several examples and found that it was 


inefficient and did not work well in practice. We do not consider this further. 

To overcome the shortcomings of constructing moves via integer multiples of an element 


from a lattice basis, we take a Markov basis for the set B (Diaconis and Sturmfels 1998). A 


Markov basis is a larger set of elements in kerz(A) that connects all hbers Ty irrespective 
of the given values in y. A hnite set Ai C keT^{A) is a Markov basis if, for any y such that 
Ay ^ 0 and for all elements * 1 , X 2 G Ay^ Xi ^ X 2 , there exist M > 0, Ui,..., vm G Ai and 
ei, ..., Cm G { — 1,1} such that 


M 

X2 = Xi + ejV 
j=i 


j and G L = 1,...,M —1. 

k=l 


The hrst condition says that we can use moves from a Markov basis as in the algorithm in 
Figure to move between any two elements of our hber. The second condition says that 
when moving between any two elements in the hber, we always remain in the hber (i.e. we 
never encounter a negative count). 

Although Markov bases are relatively easy to describe there is no simple algorithm for 


their computation. Diaconis and Sturmfels (1998) show how a Markov basis can be computed 


using techniques from commutative algebra. The theory is based on what is now known as 
the Fundamental Theorem of Markov Bases which describes how hnding a Markov basis is 
equivalent to hnding a set of generators of a toric ideal in a polynomial ring associated with 
the matrix A. We refer the interested reader to Cox et ah (2007|) for details on commutative 


algebra and to Diaconis and Sturmfels (1998), Drton et al. (2009), Aoki et ah (2012) and the 


references therein for additional information on the generation of Markov bases in algebraic 
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statistics. Unless otherwise stated, we use the freely available software 4ti2 (Hemmecke 


et al. 2013) to compute the Markov bases for the examples in this manuscript. 


For the 3x3 contingency table, a Markov basis consists of the nine elements in Q 



Xn 

3^21 

3^31 

3^12 

3^22 

3^32 

3^13 

3^23 

X 33 

MBl 

0 

0 

0 

0 

1 

-1 

0 

-1 

1 

MB2 

0 

0 

0 

1 

-1 

0 

-1 

1 

0 

MB3 

0 

0 

0 

1 

0 

-1 

-1 

0 

1 

MB4 

0 

1 

-1 

0 

-1 

1 

0 

0 

0 

MBS 

0 

1 

-1 

0 

0 

0 

0 

-1 

1 

MB6 

1 

-1 

0 

-1 

1 

0 

0 

0 

0 

MB7 

1 

-1 

0 

0 

0 

0 

-1 

1 

0 

MBS 

1 

0 

-1 

-1 

0 

1 

0 

0 

0 

MB9 

1 

0 

-1 

0 

0 

0 

-1 

0 

1 


It is a straightforward exercise to conhrm that we can transition between the two solutions in 
(|^ by adding or subtracting moves from (|^ one-at-a-time without encountering a negative 
count. More importantly, the moves in Q can be used to connect any two solutions in the 
same hber, no matter what value of y is observed. 

There is often a need to analytically hnd a Markov basis for a given problem. Even though 
tools like 4ti2 are freely available, computation of Markov bases remains challenging. As we 
discuss later, for many of the capture-recapture examples we have explored, 4ti2 can fail to 
compute Markov bases for studies with a moderate to large number of sampling occasions. 
As we know of no simple test to conhrm whether a specihed set of moves 13 is a Markov 
basis, we often need to rely on theoretically derived Markov bases to conhrm that our MCMC 
algorithms are valid. In the following section we hnd such a theoretical result for a class of 
capture-recapture models including Mta. 
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3 Model Mta and Simple Corruptions 


Here, we examine model the specific model of misidentification considered by L2010. 
We fit this model into a larger class of models in which any identification error results in 
what we refer to as a simple corruption. We then show that for any model in this class, 
we can construct a lattice basis that is guaranteed to connect every element of the fiber, 
irrespective of y, i.e. it is also a Markov basis. 

Model MtQ builds on the standard closed population model with time-dependent capture 


probabilities, model Mt of Otis et ah (1978), by allowing for individuals to be misidentified 
when captured. The model assumes that all errors are unique meaning that an individual 
cannot be identified as another individual and the same error cannot occur multiple times. 
The result is that an error on the capture occasion leads to a ghost observed history 
containing a single observation on the occasion. 

For this model, the vector of summary statistics, y, contains the counts of the 2^ — 1 
observable capture histories. The vector of latent variables contains the counts of the possible 
true histories constructed from the events: 


• 0 - the individual was not captured, 

• 1 - the individual was captured and correctly identified, 

• 2 - the individual was captured and incorrectly identified. 

For example, for a study with K = 5 capture occasions the true history 01221 would generate 
three observed histories: 01001, 00100, and 00010. Including the null history 0... 0, the 
vector of true counts has length 3^. The configuration matrix. A, has dimension (2'^ —1) x3'^ 
and Aij = 1 if the true history generates the observed history and is equal to zero 
otherwise. For example, the column corresponding to the history 01221 would contain three 
non-zero entries in the rows associated with the observable histories 01001, 00100, and 00010. 
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A description of the model along with the vectors x and y and matrix A for K = 2 are 
given in the supplementary materials, with more details in L2010. 


A feature of Mto, is that whenever an error in identihcation occurs, it involves only one 
individual and results in one or more observed histories. We dehne such an error as a 
simple corruption. For example, the errors in true history 01221 above affect no other true 
history and lead to three observed histories. Another example of simple corruptions are 
the errors that occur when multiple marks cannot be matched, as described in [Bonner and 


Holmberg (2013) and McClintock et al. (2013). Suppose that a study uses photographs to 


identify individuals and that photographs taken from the left or right side cannot be matched 
without further information. In this case, any individual that is photographed from both 
the left and right sides on different occasions will contribute two histories to the observed 
data set. Using the events L and R to denote photographs from the left and right, the true 
history OLRRL would generate observed histories OLOOL and 00i?i?0. In this case, each true 
history will contribute one or two histories to the observed data set. 

For a model that contains only simple corruptions, we have the following theorem: 


Theorem 1 Suppose that: (i) A contains only the values 0 and 1 and (ii) the columns 
of A contain all of the columns of the identity matrix. Then there exists a lattice basis 
C = {ui,..., Vm}, which is also a Markov basis. 

The hrst condition (values of 0 and 1) occurs under the assumption of simple corruption, 
while the second condition (columns of the identity matrix) occurs when every observable 
history is also a true history in which there is no misidentihcation. Provided these assump¬ 
tions hold, then we can use the algorithm in Figure with a suitable lattice basis £ and 
connect the hber. The proof of this theorem is provided in the supplementary materials, 
along with a description of how to construct the lattice (Markov) basis £. 

The conditions of Theorem 1 are satished for model Mta, so that for K = 2 we obtain 


13 











the Markov basis in ([^ 



^00 


^02 

^10 

Xii 

^12 

^20 

X 21 

X 22 

MBl 

1 

0 

0 

0 

0 

0 

0 

0 

0 

MB2 

0 

-1 

1 

0 

0 

0 

0 

0 

0 

MB3 

0 

-1 

0 

-1 

0 

1 

0 

0 

0 

MB4 

0 

0 

0 

-1 

0 

0 

1 

0 

0 

MBS 

0 

-1 

0 

-1 

0 

0 

0 

1 

0 

MB6 

0 

-1 

0 

-1 

0 

0 

0 

0 

1 


The basis in (|^ is identical to that presented by L2010 for model Mt^ when K = 2. 

The approach of L2010 to hnding a basis involves choosing pivotal (or constraining) 
variables when solving the set of equations Ax = 0 (a full description is available either in 
L2010, pg 180-181, or in the supplementary materials). L2010 chose specihc pivotal variables 
(xoi, xio and xn) when hnding the basis for model Mta when K = 2. However, it was implied 
that this choice was arbitrary and no guidance was given as to how to select pivotal variables 
when K > 2. It turns out that changing the pivotal variables can lead to different sets of 
basis vectors which may not be Markov bases. We show in the supplementary materials that 
for it' = 2 and a different set of pivotal variables, X 22 , 2:20 and xn, the resulting basis differs 
from that in (|^. We also show that when the conditions of Theorem 1 are satished, there is 
a specihc choice of pivotal variables guaranteed to return the Markov basis C. In particular, 
if we order x as in L2010 for model Mto and take the variable corresponding to the leading 
non-zero entry in each row of A as pivotal (as was done by L2010 for K = 2), the basis 
found will be the Markov basis C. 

Theorem 1 ensures that there is at least one lattice basis which is also a Markov basis 
for model Mtc,. However, it does not imply that every lattice basis is a Markov basis. For 
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model MtQ, and K = 2 another lattice basis (fonnd by hand) is given in (|^ 



^00 

Xoi 

X02 

Xw 

Xii 

3^12 

X20 

X21 

X22 

LBl 

1 

0 

0 

0 

0 

0 

0 

0 

0 

LB2 

0 

-1 

1 

0 

0 

0 

0 

1 

-1 

LB3 

0 

0 

1 

0 

0 

0 

1 

0 

-1 

LB4 

0 

0 

0 

1 

0 

0 

-1 

0 

0 

LB5 

0 

0 

-1 

0 

0 

1 

-1 

1 

-1 

LB6 

0 

0 

0 

0 

0 

0 

0 

1 

-1 


Snppose the observed data are y = (363, 22,174) (as in L2010), then two elements in the hber 
are Xi = (0,363,0, 22,174, 0,0, 0, 0)' and X 2 = (0, 361, 2, 22,174, 0, 0, 0,0)'. We are nnable to 
move between these two using LBl - LB6 in ([^ as moves in the algorithm in Figure]^ In 
particular, if we start at (the observed history) Xi the moves LB2, LB3, LBS and LB6 will 
lead to automatic rejections because they will always propose a negative value. This means 
that Xi and X 2 are not connected and thus the hber is not connected. 

We repeated the analysis of L2010 using both the Markov basis in (|^ and the lattice 
basis in (|^ using the same prior distributions as in L2010 (we used only one of the priors 
L2010 considered for a; a beta distribution with parameters 19 and 1). In both cases we 
implemented the algorithm in Figure [T] using Xi as the starting value with interest in the 
abundance N. We checked convergence via trace plots and plotted the resulting distribution 
for N\y in both cases (Figure]^. The lattice basis in 0 leads to a distribution for N that 
is substantially different from the true posterior distribution and could lead to incorrect 
decision making. 

[Figure 3 about here.] 

We note that efficiency gains can be made if there are observable histories with zero 
count. In particular, we can delete the entries in y and the rows of A corresponding to 
the zero counts before deleting any columns of A and corresponding entries of x that are 
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known to have zero count. Provided the assumptions of Theorem 1 are still satished by the 
resulting conhguration matrix then we can still find a set of moves guaranteed to connect all 
elements in the fiber. The resulting set of moves is no longer a Markov basis but a Markov 


subbasis (Chen et ah 2006) as it is only valid for the observed y. This corresponds to the 


approach taken by both Bonner and Holmberg (2013) and McClintock et al. (2013) for data 
with multiple marks that could not be matched. 

This section shows that we must take care even with simple corruptions to ensure that 
the lattice basis we are using is also a Markov basis. The following two sections give examples 
where we do not have simple corruptions (in one of these it does not even make sense to 
think of corruptions in the sense of model Mto) and a Markov basis has greater cardinality 
than a lattice basis. 


4 Example: Sufficient Statistics 


Next we consider the problem of modeling data from a closed population when sufficient 
statistics from one or more models are provided in place of the raw data. The raw data 
may not be available for a variety of reasons, e.g. privacy concerns. Here we assume that 
the population is closed and that we have the sufficient statistics associated with three 


commonly used models Mt, Mb and Mb (Otis et al. 1978). From model M^ we have the 


statistics fi,..., fx, where fj is the number of individuals who were caught j times from a 
total of K sampling occasions; from model Mt we have the statistics ni,... where rij 
is the number of individuals captured in the jth sample; and from model Mb we have the 
statistic M. = with Mj the number of marked individuals in the population in 

sample j. Note that we do not include the other sufficient statistics for model Mt and Mb 


noted by Otis et al. (1978) as they are deterministic functions oi fi,, fx- 


All of these statistics are linear functions of the data which means that this problem can 
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be expressed using the linear constraint in Q. In this example, x represents the vector of 
counts for the 2^ — 1 true histories; y represents the vector of counts for the 2K +1 sufficient 
statistics; and the conhguration matrix, A, is a {2K + 1) x (2^ — 1) matrix. Details of how 
to hnd A along with an example for a study with K = 4 occasions are provided in the 
supplementary materials. 

Here we explore this scenario using multi-list data from a South Auckland, New Zealand, 


diabetes study from the Ph.D. research of Huakau (2001) and included in the Ph.D. re 


search of Sutherland (2003). We ignore the potential errors in matching individuals between 


lists and assume that each individual is correctly matched (see |Lee| ( |2002[ ) for how such 
errors could also be accounted for using the linear constraint ([^). There are K = 4 lists: 
general practitioners records (G), pharmacy records (P), outpatient records (O) and inpa¬ 
tient discharge records (D) that we assume are ordered as written. We use the data for 


males and reduce the full data (which is available in Sutherland 2003) to the statistics: 
n = (nG,np,no,n^)' = (629,622,6279,1623)', / = (/i,/s,/s,/ 4 )' = (6030,1312,161,4)' 
and M. = 8680 to give 


y = (6030,1312,161,4, 629, 622, 6279,1623, 8680)'. 


As well as y being sufficient for models Mt, Mh and M^, it is also sufficient for the two-factor 


quasi-symmetric version of model Mth that is induced by a Rasch model (see Agresti 1994 
for details of this model). 

The vector x is indexed hy uj = {u;g,^p,^o,^d), where ojj = 1 denotes inclusion on list 
j with Uj = 0 otherwise, so that xnoi is the number of individuals on lists G, P and D and 
not on list O. Our focus here is to attempt to make inference about Xioooi the number of 
individuals who appear only in list G. We may also wish to £t a model to x for which y 
are not sufficient statistics. By dehnition, the resulting model would be nonidentihable, but 
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this does not necessarily mean that there is no information about parameters of this model, 
including the abundance N. The latent multinomial model can be used in either of these 
situations. 

A lattice basis found using the Hermite normal form is 
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Using the seven moves LBl - LB7 in the algorithm in Figure it is impossible to move 
between the two solutions Xi and X 2 

xi = (652,4865, 794, 253,18, 234, 62, 260, 26, 221, 67,19,0, 32,4)' 

X2 = (684,4901, 694, 253, 31,154,161,192,49, 365, 0,19,0, 0,4)'. 

If we are currently at X 2 , it is clear that all moves (except LB3) will lead to at least one 
negative cell count and will be automatically rejected. The vector LB3 can be used to update 
X 2 , but we are unable to get to Xi using LB3 alone. Again, we have at least two sets of 
elements in the hber that we can move within, but are unable to move between. 

A Markov basis for this problem can be constructed in 4ti2 and is made up of the 16 
elements given in the supplementary materials. Since (i) 4ti2 hnds a minimal Markov basis, 
and (ii) the cardinality of the Markov basis is larger than that of a lattice basis, we can be 
certain that a lattice basis can never be a Markov basis for this problem. Even though it 
is likely possible to construct another lattice basis that can move between Xi and X 2 there 
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will be either (i) another two elements in the hber that are not connected, or (ii) another 
two elements in the hber for a different y that we cannot move between with such a lattice 
basis. 

Here we ht model and run the algorithm in Figure with both the Markov basis given 
in the supplementary materials and the lattice basis specihed above (details of the model are 
given in the supplementary materials). We make use of the factorization theorem (e.g, see 

pg. 276) that states that a model f{x\9) with sufficient statistics 

y can be expressed as 

= 9 {x\y)h{y\ 0 ). 


Casella and Berger 2002 


A practical implication is that only g{x\y) is required if interest is in a function of x such as 
Xioooi and the parameters 6 = (iV,pi,... need not be specihed. A related implication 
is that if we do choose to update 6 the resulting chains will converge to the correct posterior 
[G\y] even if we (i) do not update x, or (ii) update x using a set of moves that is unable to 
connect the hber, such as the lattice basis above; provided we specify an appropriate MCMC 
sampler for 0. 

Using the lattice basis and starting at X 2 the resulting distributions for Xiooo are qualita¬ 
tively diherent from the posterior distribution found using the Markov basis even though the 
individual chains appear to have converged to the stationary distribution (Figure]^. The 
true value of a^iooo = 260 has some posterior mass when using a Markov basis (despite being 
in the tail). If we were to believe the results when using the lattice basis a:iooo = 260 is so 
far in the tail, we would conclude it has negligible posterior mass. 


[Figure 4 about here.] 
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5 Example: Band Misreading in Mark-Resight 


As a final example we consider a mark-resight model which allows for the possibility that 
individnals are misidentihed when resighted in the held. Imagine that there are Ki distinct 
occasions, on which researchers captnre a nnmber of nnmarked individnals, mark them, and 
release them back into the population. Along with that are a series of K 2 resighting occasions, 
on which the researchers conduct visual surveys to identify previously marked individuals. 
Data from the experiment consist of the observed resighting histories for each individual. If 
there were no errors then standard mark-resight models could be used to estimate survival 


or movement rates (e.g. Hestbeck et ah 1991); or abundance (e.g. McClintock et al. 2006). 


Suppose now that individuals may be misidentihed when they are resighted. In direct 
contrast to model MtQ,, which assumes that errors are unique and never match other indi¬ 
viduals, we assume that errors may be repeated and always match the identity of previously 
marked individuals. The justihcation for this assumption is that the available set of marks is 
known on each occasion when individuals are identihed by man-made marks instead of natu¬ 
ral markers (e.g., genotypes or photo-id). Erroneous sightings of marks which have not been 
released can then be identihed and removed from the data prior to the analysis. The only 
time an error cannot be detected and discarded is when one previously marked individual is 
misidentihed as another previously marked individual. We note that removal of erroneous 
sightings is only justihed when estimating survival. Removing erronous sightings when in¬ 


cluding unmarked individuals would lead to biased estimators of abundance (McClintock 


et al. 2014). 


For the remainder of the section, we assume that the capture and resighting occasions 
occur simultaneously so that K = Ki = K 2 . The true capture histories for each individual 
can now be constructed in terms of four possible events. On each occasion, individual i may 
be: 
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• not captured or resighted (event 0), 


• captured or resighted and correctly identihed (event 1), or 

• resighted and incorrectly identihed (event 2). 

Further to this, another individual may be resighted and incorrectly identihed as individual i 
(event 3). Events 2 and 3 represent false negative and false positive resightings. For example, 
the history 123 for individual i would indicate that i was captured and marked on the hrst 
occasion, was resighted and misidentihed on the second occasion, and that another individual 
was resighted and identihed as i on the third occasion of a study with K = 3 occasions. To 
simplify the example, we assume that individuals cannot be misidentihed when they are hrst 
captured and that multiple events involving the same individual cannot occur on a single 
occasion (e.g., it is not possible to resight i and incorrectly identify another individual as 
i on the same occasion). This assumption may be unrealistic in some situations and was 
made to make the approach tractable. Developing methodology to relax this assumption is 
ongoing research. 

For an experiment with K occasions, the model has (4^ —1)/3 possible true histories and 
the usual 2'^ — 1 observable histories. Further to this, there are K — 1 extra constraints that 
equate the number of false negatives and false positives (2s and 3s) on occasions 2 through 
K. As a result, A has dimension (2^ + K — 2) x (4'^ — l)/3 and a basis for keTz(A) has 
(4^ — l)/3 — (2'^ + K — 2) elements. 

To make this more concrete, we consider the specihc case of an experiment comprising 
K = 3 occasions. In this case, there are (4^ — l)/3 = 21 possible true histories, 2^ — 1 = 7 
observable histories, and 3 — 1 = 2 extra constraints on the number of false positive and 
negative resightings (2s and 3s) on occasions 2 and 3. Details of how to construct A along 
with X and y for a study with K = 3 capture occasions are provided in the supplementary 
materials. In this case, a basis for kerz(A) has 12 elements and the specihc lattice basis 
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obtained using the Hermite normal form is provided in the supplementary materials, along 
with the Markov basis, computed using 4ti2, that has 63 elements. 

To illustrate the problems that can occur with this model we hrst consider the analysis 
of a single (fake) data set. Suppose that each observable history is recorded one time so that 

y = (l,l,l,l,l,l,l). 

An exhaustive search conhrms that the hber dehned by y contains exactly 120 unique el¬ 
ements. However, the lattice basis given in the supplementary materials does not connect 
all of the elements in the hber. Instead, the lattice basis divides the hber into two distinct 
pieces including a large set of 87 connected elements; and a further set of 33 isolated elements 
which connect to nothing else. As a result, the distribution of the sample generated by the 
algorithm in Figure [fusing the elements of the lattice basis in the supplementary materials 
as moves will depend on the starting point. 

To show this, we have investigated the output from the algorithm in Figure [T] when using 
a lattice basis as our set of moves. We have chosen a starting point that lies in the largest 
part of the hber and connects with 86 other elements: 


xi = ( 1 , 0 , 1 , 1 , 0 , 1 , 1 , 0 , 0 , 1 , 0 , 0 , 1 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 )'. 


Assuming a multinomial distribution for [x\0] is not appropriate to account for the band 
misreading process and specihcation of a more complex [x\0] is ongoing research. As our 
goal is to show that a lattice basis is unable to connect the hber, we simplify the model by 
setting [x\0] oc 1. A valid sampler should then sample uniformly from the 120 elements in 
the hber. For comparison, we have also run a chain using the full Markov basis starting at 
Xi. As expected, the hrst chain visits 87 unique solutions and the second visits all 120. To 
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visualize the impact this can have on inference, Figure compares the distributions of the 
number of errors in the solutions identihed by each chain. Using the lattice basis, the hrst 
chain oversamples the solutions with too few errors, placing too much mass on solutions with 
one or two errors and not enough on solutions with three, four, or hve errors. In comparison, 
the distribution generated using the full Markov basis matches the true distribution of the 
number of errors in the 120 elements almost exactly. 

[Figure 5 about here.] 


6 Discussion 

Here we have presented examples of capture-recapture models that show the importance 
of using a Markov basis when sampling from a linearly constrained vector of counts. In 
particular, we have demonstrated the danger of using elements of a lattice basis as one-at-a- 
time moves in an algorithm as in Figure In many situations a set referred to as a Markov 
basis is needed to ensure we can move between various elements of the hber without passing 
through invalid (negative) counts. Even when a Markov basis is a lattice basis, we must take 
care because not every lattice basis is a Markov basis. 

For a given matrix A the need for a Markov basis over a simpler lattice basis depends on 
the lattice basis chosen, as well as the data observed. If we consider the lattice basis for the 
3x3 contingency table in section difficulties arose because our data had a row sum of 0. 
A related issue is that even when a lattice basis is unable to connect the hber, it may still be 
able to connect nearly all elements in the hber. In such a case, using a lattice basis may lead 
to a distribution that is an acceptable approximation of the true posterior distribution. This 
is especially the case if the elements of the hber that are not connected to the initial value 
are in areas of low probability in the model [x\0]. This can be seen in the example from 
Section]^ using the lattice basis and starting at the second starting value (Figure]^ right 
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panel) results in an estimated posterior density that is practically indistinguishable from the 
true posterior distribution (Figure]^. However, there is no guarantee that any given lattice 
basis will provide a good approximation to the hber. It is possible that even with multiple 
starting values we may choose values that only connect a small proportion of the hber. 

One important aspect that we have only briehy mentioned is the difficulty in constructing 
Markov bases. For the purposes of this manuscript we have overcome this difficulty through 


(i) analytical results, or (ii) the use of the software package 4ti2 (Hemmecke et ah 2013). 
While the latter is possible for the examples we explored, it is unable to evaluate a Markov 
basis for some capture-recapture examples with a moderate to large number of sampling 
occasions. For example, 4ti2 was unable to compute a Markov basis (on the lead authors 
work machine) for the band read error model in section [^for K > A. If we were to use 4ti2 
for model Mto (ignoring the theorem presented in section]^, 4ti2 was unable to compute 
a Markov basis for K > 5. The implication of this is that for an algorithm in the spirit of 
Figure to be implemented for problems not involving simple corruptions, methodological 
work is likely to be necessary to ensure a potential set of moves is a Markov basis. 

Several alternative algorithms and methods have been proposed for sampling from the 
hber that avoid the calculation of a full Markov basis. We anticipate that such approaches 
may be useful for a range of capture-recapture examples. These include independent sam¬ 


pling of elements of the hber (e.g., see Chen et ah 2005), extending the algorithm to allow 
limited travel through vectors x that contain negative values while using a set of moves that 
is not guaranteed to connect the hber (e.g., see Bunea and Besag||2000 ) and approaches that 
dynamically hnd a Markov basis as the algorithm runs (e.g., see Dobra||2012 ). While promis¬ 
ing, we expect these approaches will require adapting to the particular challenges faced in 
problems involving misidentihcation in capture-recapture data. 
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1: Initialize so that y = Ax'^ 

2: for i = 1 : n do 

3: Sample /c G {1,2,..., m} with equal probability 

4: Sample c G { —1,1} with equal probability 

5: Set CCcand = x'--^ + cak 

6: Calculate the metropolis acceptance probability: r = min ^1, j 

7: Accept ajcand with probability r (if accepted x'‘ = a^candi otherwise x^ = x^~^) 

8: end for 

Figure 1: Algorithm for updating the latent counts x. The value n is the number of iterations 
in the algorithm and the vectors B = {ai, a 2 ,..., o-m} are a subset of the kernel of A. 
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Posterior density: modei =M,o,, chain 1 


Posterior density: modei = SS, chain 1 




Posterior density: model =M,o,, chain 2 Posterior density: model = SS, chain 2 




Figure 2: Estimated posterior densities of a quantity of interest for model M^q, (left panel) 
and a multi-list model where summary statistics are presented in place of full data (SS; right 
panel). Within each model, the resulting density estimates are plotted separately from the 
output of two parallel MCMC algorithms (for each model) with different starting values. 
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Posterior density for N (Markov basis) 



N 


Posterior density for N (iattice basis) 



N 


Figure 3: Histograms of the estimated posterior density of N\y when using the Markov basis 
from ([^ (top) and the lattice basis from ([^ (bottom) when starting from Xi. 
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Posterior density for Xiooo (Markov basis) 
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Posterior density for Xiooo (lattioe basis) 
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Figure 4: Posterior densities of Xiooo when using the Markov basis from the supplementary 
materials (top) and the lattice basis specihed in section 4 (bottom) when starting at X 2 - 
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Figure 5: Distributions of the number of errors in the solutions sampled given the data 
y. The top histogram illustrates the distribution generated using the lattice basis with the 
starting value Xi. The bottom plot illustrates the distribution obtained using the full Markov 
basis with the same starting value. In each plot, the gray bars represent the distribution 
of the number of errors while the dashed bars represent the true distribution over all 120 
unique solutions. 
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